
Conversation

shaun-nx
Contributor

@shaun-nx shaun-nx commented Oct 15, 2025

Proposed changes

In #4004, we updated several files to use modules/ instead of /usr/lib/nginx/modules, as NGINX creates a symlink from /etc/nginx/modules to the appropriate modules directory based on the operating system.

That change was done to ensure consistent behavior between images built on different operating systems.

This change ensures we do the same for the new epp.njs module.
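
For illustration only, here is a small Go program (not part of this PR) that prints where that symlink points on a typical NGINX image; the paths are assumptions about the image layout:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// On official NGINX images, /etc/nginx/modules is a symlink to the
	// OS-specific modules directory (e.g. /usr/lib/nginx/modules on Debian),
	// which is why relative modules/ paths work across base images.
	target, err := os.Readlink("/etc/nginx/modules")
	if err != nil {
		fmt.Println("/etc/nginx/modules is not a symlink here:", err)
		return
	}
	fmt.Println("/etc/nginx/modules ->", target)
}
```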

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


sjberman and others added 9 commits September 18, 2025 15:37
Problem: To support the full Gateway API Inference Extension, we need to be able to extract the model name from the client request body in certain situations.

Solution: Add a basic NJS module to extract the model name. This module will be enhanced (I've added notes) to be included in the full solution. On its own, it is not yet used.
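
As a rough illustration of what that extraction does, here is a Go sketch of pulling the model name out of an OpenAI-style JSON request body. The actual module is written in NJS (JavaScript), and the "model" field name is an assumption about the request format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractModel pulls the "model" field out of an OpenAI-style request body.
// This is a Go stand-in for the NJS module described above.
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

func main() {
	model, err := extractModel([]byte(`{"model": "llama-3", "prompt": "hi"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(model) // prints: llama-3
}
```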
This commit adds support for the control plane to watch InferencePools. A feature flag has been added to enable/disable processing these resources. By default, it is disabled.

When an HTTPRoute references an InferencePool, we will create a headless Service associated with that InferencePool, and reference it internally in the graph config for that Route. This allows us to use all of our existing logic to get the endpoints and build the proper nginx config for those endpoints.

In a future commit, the nginx config will be updated to handle the proper load balancing for the AI workloads, but for now we just use our default methods by proxy_passing to the upstream.
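
To make the headless Service idea concrete, here is a minimal Go sketch using client-go types; the naming scheme, selector handling, and port are hypothetical, not NGF's actual code:

```go
package graph

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// headlessServiceFor builds a headless Service selecting an InferencePool's
// pods, so the existing endpoint-resolution logic can be reused unchanged.
func headlessServiceFor(poolName, namespace string, selector map[string]string, port int32) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      poolName + "-shadow", // hypothetical naming scheme
			Namespace: namespace,
		},
		Spec: corev1.ServiceSpec{
			ClusterIP: corev1.ClusterIPNone, // headless: endpoints only, no cluster VIP
			Selector:  selector,             // taken from the InferencePool's selector
			Ports:     []corev1.ServicePort{{Name: "inference", Port: port}},
		},
	}
}
```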
Problem: In order for NGINX to get the endpoint of the AI workload from the EndpointPicker, it needs to send a gRPC request using the proper protobuf protocol.

Solution: A simple Go server is injected as an additional container when the inference extension feature is enabled, that will listen for a request from our (upcoming) NJS module, and forward to the configured EPP to get a response in a header.
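
A stripped-down Go sketch of that shim follows: an HTTP server that reads the client request body, asks the EPP for an endpoint, and hands it back in a response header. The listen address, header name, and pickEndpoint helper are all hypothetical; the real shim speaks the EPP's gRPC/protobuf protocol:

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// pickEndpoint is a hypothetical stand-in for the gRPC exchange with the
// configured EndpointPicker (EPP).
func pickEndpoint(body []byte) (string, error) {
	// The real implementation sends the request details to the EPP over gRPC
	// and reads the chosen endpoint from its response.
	return "10.0.0.12:8000", nil // illustrative endpoint
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "failed to read body", http.StatusBadRequest)
			return
		}
		endpoint, err := pickEndpoint(body)
		if err != nil {
			http.Error(w, "endpoint pick failed", http.StatusBadGateway)
			return
		}
		// Return the chosen endpoint to the NJS module in a header.
		w.Header().Set("X-Inference-Endpoint", endpoint) // hypothetical header name
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:54800", nil)) // hypothetical port
}
```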
Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload.

Solution: Write an NJS module that will query the local Go server to get the AI endpoint to route traffic to. Then redirect the original client request to an internal location that proxies the traffic to the chosen endpoint.

The location building gets a bit complicated especially when using both HTTP matching conditions and inference workloads. It requires 2 layers of internal redirects. I added lots of comments to hopefully clear up how we build these locations to perform all the routing steps.
Update the inference extension design doc to specify the different statuses that need to be set on InferencePools to convey their state
…4006)

Update gateway inference extension proposal on inability to provide a secure TLS connection to EPP.
Add status to InferencePools

Problem: Users want to see the current status of their InferencePools.

Solution: Add status for InferencePools.
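
As a hedged sketch of what recording such a status could look like with the standard apimachinery condition helpers; the condition type, reason, and the real InferencePool status layout are assumptions here:

```go
package status

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markPoolAccepted records an "Accepted" condition on a conditions slice.
// Condition type, reason, and message are illustrative only.
func markPoolAccepted(conditions *[]metav1.Condition, generation int64) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Accepted",
		Status:             metav1.ConditionTrue,
		Reason:             "Accepted",
		Message:            "InferencePool is referenced by an HTTPRoute",
		ObservedGeneration: generation,
	})
}
```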
Proposed changes
Problem: We want to collect the number of referenced InferencePools in the cluster.

Solution: Collect the count of referenced InferencePools.

Testing: Unit tests and manually verified collection via debug logs.
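
For the counting itself, a tiny Go sketch of deduplicating pool references across routes; poolRef and the slice-of-slices input are hypothetical stand-ins for NGF's graph types:

```go
package telemetry

// poolRef is a hypothetical stand-in for a namespaced InferencePool reference.
type poolRef struct{ namespace, name string }

// countReferencedPools counts unique InferencePools referenced by any route,
// so a pool referenced by several routes is counted once.
func countReferencedPools(refsPerRoute [][]poolRef) int {
	seen := make(map[poolRef]struct{})
	for _, refs := range refsPerRoute {
		for _, ref := range refs {
			seen[ref] = struct{}{}
		}
	}
	return len(seen)
}
```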
@shaun-nx shaun-nx requested a review from a team as a code owner October 15, 2025 12:45
@github-actions github-actions bot added the chore Pull requests for routine tasks label Oct 15, 2025
Contributor

@tataruty tataruty left a comment

strange lint errors though

@salonichf5
Contributor

strange lint errors though

We need to rebase this branch with main. It's pulling in old charts.

@salonichf5 salonichf5 force-pushed the feat/inference-extension branch from 2a4c8d7 to 52fb31b October 15, 2025 18:30
@salonichf5 salonichf5 requested a review from a team as a code owner October 15, 2025 18:30
@tataruty tataruty self-requested a review October 16, 2025 07:10
@tataruty
Contributor

Something happened to the branch; it got too many old commits.

Contributor

@tataruty tataruty left a comment

fix branch

@github-project-automation github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Oct 16, 2025
@shaun-nx
Contributor Author

Some of the rebasing conflicts are taking too long to resolve, so I'm just going to make a fresh PR for this.

@shaun-nx shaun-nx closed this Oct 16, 2025
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✅ Done in NGINX Gateway Fabric Oct 16, 2025

Labels

chore Pull requests for routine tasks

Projects

Status: Done


5 participants